Graph data management systems have become very popular\udas graphs are the natural data model for many applications.\udOne of the main problems addressed by these systems is subgraph\udquery processing; i.e., given a query graph, return all\udgraphs that contain the query. The naive method for processing\udsuch queries is to perform a subgraph isomorphism\udtest against each graph in the dataset. This obviously does\udnot scale, as subgraph isomorphism is NP-Complete. Thus,\udmany indexing methods have been proposed to reduce the\udnumber of candidate graphs that have to underpass the subgraph\udisomorphism test. In this paper, we identify a set of\udkey factors-parameters, that influence the performance of\udrelated methods: namely, the number of nodes per graph,\udthe graph density, the number of distinct labels, the number\udof graphs in the dataset, and the query graph size. We then\udconduct comprehensive and systematic experiments that analyze\udthe sensitivity of the various methods on the values of\udthe key parameters. Our aims are twofold: first to derive\udconclusions about the algorithms’ relative performance, and,\udsecond, to stress-test all algorithms, deriving insights as to\udtheir scalability, and highlight how both performance and\udscalability depend on the above factors. We choose six wellestablished\udindexing methods, namely Grapes, CT-Index,\udGraphGrepSX, gIndex, Tree+∆, and gCode, as representative\udapproaches of the overall design space, including the\udmost recent and best performing methods. We report on\udtheir index construction time and index size, and on query\udprocessing performance in terms of time and false positive\udratio. We employ both real and synthetic datasets. Specifi-\udcally, four real datasets of different characteristics are used:\udAIDS, PDBS, PCM, and PPI. In addition, we generate a\udlarge number of synthetic graph datasets, empowering us to\udsystematically study the algorithms’ performance and scalability\udversus the aforementioned key parameters.
展开▼